(54) Video information retrieval

(57) A video information retrieval system comprises a client system having: means for issuing a search request in respect of desired video material; and means for accessing video material on the basis of a uniform resource locator (URL) and a SMPTE unique material identifier (UMID); a server system having: access to one or more databases containing metadata information relating to a plurality of video material items, a UMID associated with each video material item and at least one URL associated with each UMID; means for receiving a search request from the client system and detecting one or more video material items for which metadata information stored in at least one of the database(s) substantially corresponds to the search request; means for supplying the metadata information, the URL and the UMID relating to the one or more detected video material items to the client system; and at least one video repository having: a video storage arrangement storing video material and associated UMID data; in which the metadata, the URL and the UMID are communicated between the server and the client using a markup language having descriptors for data content.
Description

[0001] This invention relates to video information retrieval.

[0002] Video images are a useful resource for entertainment and for the dissemination of information. Digital video images are also increasingly being used in a wide range of multimedia applications.
[0003] The sheer volume of video information currently available to the user is overwhelming: there are many video libraries and archives, each of which potentially stores millions of images. These video archives have a broad spectrum of users running different applications and requiring a range of services, from the provision of subject-specific video clips for editing purposes to video on demand. In practical terms, the video archive environment must allow users to run custom applications which utilise a common database of video images, and must provide descriptive data related to the video images so that the user can make an informed choice of which media file to download. The generic term for the descriptive data associated with video images is metadata.
[0004] Computer database management systems have proved to be very effective for organising text and numeric data. The most widespread database management systems are known as "relational" databases. These systems collect data and organise it as a set of formally described tables from which data can be accessed selectively and reassembled in a variety of ways without having to reorganise the data tables. The standard user and application program interface (API) to a relational database is the structured query language (SQL), which can be used for simple interactive queries as well as for more extensive data gathering for use in compiling reports.

[0005] A further example of an information management system is a web search engine. The web search engine is ideally suited for use in a multimedia environment and has three basic components:

■ A program known as a "spider" that goes to every page, or to representative pages, on every web site that wants to be searchable and reads it, using hypertext links on each page to discover and read a site's other pages.

■ A program that creates a master index from the pages that have been read.

■ A program that receives a user's text-based search request, compares it to the entries in the master index, and returns results to the user.
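The second and third components can be illustrated with a minimal Python sketch: a master (inverted) index mapping each word to the pages containing it, and a search function returning the pages that contain every word of a request. The spider is not modelled; the `pages` dictionary of pre-fetched text and its example.com URLs are hypothetical stand-ins for crawled pages, and real engines also rank their results.

```python
from collections import defaultdict


def build_master_index(pages):
    """Map each lower-cased word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, request):
    """Return, sorted, the URLs of pages containing every word of a text search request."""
    words = request.lower().split()
    if not words:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown words.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical pre-fetched pages standing in for what a spider would have read.
pages = {
    "http://example.com/a": "global warming news report",
    "http://example.com/b": "interview with a politician",
    "http://example.com/c": "politician discusses global warming",
}
index = build_master_index(pages)
print(search(index, "global warming"))  # → ['http://example.com/a', 'http://example.com/c']
```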

[0006] Video archives are of very limited value to the user unless there is an information management system for images capable of delivering images based on their specific content. Such a video information management system is likely to require features used in database management systems as well as some of the functionality of the web search engine. One difficulty is that image and video data require a much higher bandwidth than text-based information. Downloading a video clip across a computer network can be very time-consuming because of the large quantity of data involved. In some cases the user may have to download and view several video clips in real time in order to find a clip with the required information content. Thus it is very important to provide the user with adequate information about images in the archives prior to any download, to increase the likelihood of the downloaded images meeting the user-specific requirements. Some users may be looking for video clips that can be used to illustrate a particular feature or issue, for example video segments showing a particular politician or dignitary. Other users might be searching for complete programmes and news items related to a specific topic such as global warming. It would also be advantageous to the user to have unrestricted access to as many video archives as possible via a single video-specific search query.

[0007] A typical prior-art video information retrieval system for use on the world-wide web is illustrated in Figure 1. Video source material 10 is input as raw video information 15 to an encoding and content-analysis module 20. The source material could be a digital or analogue video cassette, an electronically stored digital video file or a broadcast signal fed directly via satellite. The encoding and content-analysis module 20 takes the video source material and produces digital copies of it in various alternative formats, ranging from low bit-rate versions suitable for use with Internet browser plug-ins such as RealVideo™ to high bit-rate, broadcast-quality MPEG2 images.

[0008] On input to the video archive system, the analogue or digital source material is subjected to an automated content-analysis process. This typically involves the use of local intensity histograms, edge histograms, geometrical shape analysis, face detection and on-screen text extraction to establish and log the content of each image. The associated audio samples may be processed for content using speech detection algorithms. Proprietary content-analysis software such as Virage's Videologger™ has been used for this purpose. The result is a video index 25 which summarises the content of the video material.
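A minimal sketch, assuming 8-bit greyscale frames held as flat lists of pixel values, shows how intensity histograms can flag changes of content between images. It is a simplified, whole-frame variant of the local-histogram analysis mentioned above (local histograms would be computed per image region); the bin count, the L1 distance and the threshold of 0.5 are illustrative assumptions, not values taken from any product such as Videologger™.

```python
def intensity_histogram(frame, bins=8):
    """Normalised histogram of 8-bit grey levels for a frame given as a flat list of pixels."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    total = len(frame)
    return [count / total for count in counts]


def histogram_distance(h1, h2):
    """L1 distance between normalised histograms: 0.0 if identical, 2.0 if fully disjoint."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames whose histogram differs sharply from the previous frame's."""
    boundaries = []
    previous = intensity_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        current = intensity_histogram(frame)
        if histogram_distance(previous, current) > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy 64-pixel frames: a dark shot, a bright shot, then a dark shot again.
dark = [10] * 64
bright = [240] * 64
frames = [dark, dark, bright, bright, dark]
print(detect_shot_boundaries(frames))  # → [2, 4]
```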
[0009] A video application server 30 stores the video index 25 in an appropriate format so that it is accessible to a web server 40. In addition, the video application server 30 provides a flexible template system, handles client queries and provides administration tools. Clients 60 running Internet browsers have access to the video index via the web server 40. The clients may enter search terms in a standard web search engine which is interfaced to the video index, so that video material can be selectively retrieved on the basis of its logged content.
[0010] The encoding and content-analysis module 20 outputs the digital video information 65 across a distribution network. The digital video information 65 is available for download to the clients via a video server 50. The video index 25 is used to search for and retrieve particular video clips required by users.
[0011] The invention provides a video information retrieval system comprising:

a client system having:

means for issuing a search request in respect of desired video material; and
means for accessing video material on the basis of a uniform resource locator (URL) and a SMPTE unique material identifier (UMID);

a server system having:

access to one or more databases containing metadata information relating to a plurality of video material items, a UMID associated with each video material item and at least one URL associated with each UMID;
means for receiving a search request from the client system and detecting one or more video material items for which metadata information stored in at least one of the database(s) substantially corresponds to the search request;
means for supplying the metadata information, the URL and the UMID relating to the one or more detected video material items to the client system;

and at least one video repository having:

a video storage arrangement storing video material and associated UMID data;

in which the metadata, the URL and the UMID are communicated between the server and the client using a markup language having descriptors for data content.
[0012] The invention provides an improved video information retrieval system which (a) uses UMIDs to access video material, thereby providing a unique and platform- (or vendor-) independent index to the video material, and (b) uses a markup language having descriptors for data content as the transmission means for the search results. The latter means that the communication required for the video information retrieval system can also potentially be platform- and vendor-independent, as such markup language files can potentially be transmitted via the generally available HTTP port 80.

[0013] Further respective aspects and features of the invention are defined in the appended claims.
[0014] Embodiments of the invention will now be described with reference to the accompanying drawings, throughout which like parts are referred to by like references, and in which:

Figure 1 schematically illustrates a prior art video information retrieval system;

Figure 2 is a schematic diagram of a video information retrieval system according to an embodiment of the present invention; and

Figures 3 and 4 are schematic examples of the use of XML data structures.

[0015] Referring now to the drawings, Figure 2 is a schematic illustration of a video information retrieval system according to an embodiment of the invention. A client 100 running a web browser initiates a search request 105 specifically directed to video material. The search is performed via a web search engine, which communicates via a common gateway interface (CGI) on a server 110. The search engine converts the client request to a database query 115, and the client request is output as a signal 125 to a metadata database 130A or, if so required, to a series of databases (130A, 130B...) distributed across the Internet.
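The conversion of a client request into a database query can be sketched in Python with the standard `sqlite3` module, assuming a relational table of metadata. The document does not specify the database type or schema: the `metadata` table, its columns and the short `UMID-0001` style identifiers are hypothetical (a real UMID is a 32- or 64-byte binary value). Search terms are passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3


def search_metadata(conn, terms):
    """Turn search terms into one parameterised SQL query; a clip must match every term."""
    if not terms:
        return []
    # Only "?" placeholders are formatted into the SQL; the terms travel as bound parameters.
    where = " AND ".join("description LIKE ?" for _ in terms)
    sql = f"SELECT umid, url FROM metadata WHERE {where} ORDER BY umid"
    params = [f"%{term}%" for term in terms]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (umid TEXT PRIMARY KEY, url TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("UMID-0001", "http://video.example.com/clip1", "Nelson Mandela speech in Cape Town"),
        ("UMID-0002", "http://video.example.com/clip2", "Glacier footage on global warming"),
    ],
)
# SQLite's LIKE is case-insensitive for ASCII letters by default.
print(search_metadata(conn, ["mandela"]))  # → [('UMID-0001', 'http://video.example.com/clip1')]
```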
[0016] The main obstacle in attempting to gain access to remote databases of video material via the Internet is that in many cases client and server machines will be separated by a firewall or proxy server. A firewall is a set of related programs, located at a network gateway server, that protects the resources of a private network from users of other networks. By working closely with a router program, a firewall filters all network packets and decides whether or not to forward them to their destination. A proxy server, which makes network requests on behalf of users, may be included in a firewall or work closely with it. Firewalls are generally able to distinguish one protocol from another. In the Transmission Control Protocol/Internet Protocol (TCP/IP) architecture, a specific port number is assigned to each common protocol and each request made using that protocol carries that number. For example, HTTP is assigned to port 80 while the File Transfer Protocol (FTP) is assigned to port 21. Most firewalls allow a specific protocol to be blocked by rejecting all traffic sent on the port number associated with that protocol. Most firewalls are configured to let through traffic on port 80, which is how HTTP requests from browsers get through. Since each unblocked protocol poses a potential security threat, firewalls are generally set up to block most ports with the exception of port 80. As shall be explained below, the interchange between the client and the metadata database according to embodiments of the invention is in a markup language that has descriptors for data content, such as XML. Since XML is text-based, advantage can be taken of HTTP port 80 to deploy an Internet-wide video archive search facility.

HTTP alone would not be sufficient to implement searches on remote databases of video material across multiple platforms, because it lacks a single standard format for representing queries. Because XML is a platform-neutral data representation, it can be used on top of HTTP to serialise data into a transmissible form that is easily decoded on any platform. This is the basis on which remote procedure call (RPC) protocols such as Microsoft's Simple Object Access Protocol (SOAP™) operate. RPCs are specially designed to provide access to computer program objects resident on machines that are distributed across the Internet.
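As a hedged illustration of carrying an XML search request over HTTP, the sketch below serialises a request with `xml.etree.ElementTree` and wraps it in a `urllib.request.Request` POST without sending it. The `searchRequest`/`term` element names and the server URL are hypothetical, and the body is plain XML rather than a full SOAP envelope. An `http://` URL with no explicit port resolves to port 80, the port most firewalls leave open.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def build_search_request(server_url, terms):
    """Serialise search terms as XML and wrap them in an HTTP POST request (not sent)."""
    root = ET.Element("searchRequest")
    for term in terms:
        ET.SubElement(root, "term").text = term
    body = ET.tostring(root, encoding="utf-8")  # bytes; no XML declaration is added for utf-8
    return urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_search_request("http://search.example.com/cgi-bin/video-search", ["Mandela"])
# A plain http:// URL without an explicit port uses the default HTTP port, 80.
port = urlsplit(req.full_url).port or 80
print(req.get_method(), port)  # → POST 80
print(req.data.decode("utf-8"))
```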
[0017] In a video retrieval system designed for deployment across the Internet there will be no central management of the video archives, and it is therefore very important to be able to identify uniquely and unambiguously each video clip that is accessible to the user. The metadata database 130 uses the SMPTE UMID to relate the stored metadata to the particular video material from which it was generated.

[0018] The UMID is described in the March 2000 issue of the "SMPTE Journal". An "extended UMID" comprises a first set of 32 bytes of "basic UMID" and a second set of 32 bytes of "signature metadata". The basic UMID has a key-length-value (KLV) structure and comprises:

■ A 12-byte Universal Label or key which identifies the SMPTE UMID itself and the type of material to which the UMID refers. It also defines the methods by which the globally unique Material and locally unique Instance numbers (defined below) are created.

■ A 1-byte length value which specifies the length of the remaining part of the UMID.

■ A 3-byte Instance number used to distinguish between different "instances" or copies of material with the same Material number.

■ A 16-byte Material number used to identify each clip. A Material number is provided at least for each shot and potentially for each image frame.

The signature metadata comprises:

■ An 8-byte time-date code identifying the time of creation of the "Content Unit" to which the UMID applies. The first 4 bytes are a Universal Time Code (UTC) based component.

■ A 12-byte value which defines the (GPS-derived) spatial co-ordinates at the time of Content Unit creation.

■ 3 groups of 4-byte codes which comprise a country code, an organisation code and a user code.
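The byte layout listed above can be checked with a small Python sketch that slices a 64-byte extended UMID into its fields. Field sizes follow the lists above (12 + 1 + 3 + 16 = 32 bytes, then 8 + 12 + 4 + 4 + 4 = 32 bytes); the fields are returned as raw bytes and not decoded further, and the all-zero label in the usage example is a placeholder, not a real SMPTE Universal Label. On the reading that the length byte covers the remaining part of the UMID, an extended UMID would carry 64 - 13 = 51, the value used in the example.

```python
from dataclasses import dataclass


@dataclass
class BasicUmid:
    universal_label: bytes  # 12 bytes: identifies a SMPTE UMID and the material type
    length: int             # 1 byte: length of the remaining part of the UMID
    instance_number: bytes  # 3 bytes: distinguishes copies with the same Material number
    material_number: bytes  # 16 bytes: identifies the clip (shot or frame)


@dataclass
class SignatureMetadata:
    time_date: bytes        # 8 bytes: time of creation of the Content Unit
    spatial: bytes          # 12 bytes: GPS-derived spatial co-ordinates
    country: bytes          # 4 bytes
    organisation: bytes     # 4 bytes
    user: bytes             # 4 bytes


def split_extended_umid(raw: bytes):
    """Slice a 64-byte extended UMID into its basic UMID and signature metadata fields."""
    if len(raw) != 64:
        raise ValueError("an extended UMID is 64 bytes: 32 basic + 32 signature")
    basic = BasicUmid(raw[0:12], raw[12], raw[13:16], raw[16:32])
    signature = SignatureMetadata(raw[32:40], raw[40:52], raw[52:56], raw[56:60], raw[60:64])
    return basic, signature


# Placeholder bytes only: all-zero label, length 51, instance 1, material 0..15, zero signature.
raw = bytes(12) + bytes([51]) + b"\x00\x00\x01" + bytes(range(16)) + bytes(32)
basic, signature = split_extended_umid(raw)
print(basic.length, basic.instance_number.hex(), len(signature.spatial))  # → 51 000001 12
```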

[0019] The metadata databases 130 contain data describing the content of video material. The metadata includes location information for the video images to which it corresponds, such as a uniform resource locator (URL). The URL for a video clip is associated with the UMID identifier, and an additional timecode can be used to obtain particular still images from a given clip. The metadata also includes analysis data from post-processing of the image signal, such as sub-shot segmentation information and information about an image frame called a representative keystamp (RKS), which gives a visual indication of the predominant overall contents of each shot or sub-shot.



[0020] Proprietary content-extraction tools such as Virage's Videologger™ can be used to obtain descriptive information about the component "objects" in each video clip, such as people, buildings, cities, the topic or event to which the clip relates, actors' names and details of the ownership rights of the footage. The content index for each video clip is stored as metadata. The metadata can be stored in the databases 130 in any format.
[0021] As illustrated in Figure 2, the server 110 responds to the client search request 105 by returning an XML file 155 containing metadata for the video clips which match the user's search request. XML is an example of a markup language. Although XML is the preferred markup language for interchange of data between the client and the databases 130, any markup language that has descriptors for data content could be used. Markup languages are computer programming languages in which document structure is indicated in the same stream as the text. Markers such as < and > divide documents into elements and attributes. Elements are containers that hold content, and possibly other elements, in a hierarchy. Attributes provide additional information about a particular element. Elements and attributes are specified by tags enclosed in < and >. A start tag includes the element name and the names and values of the attributes, while an end tag is marked by a forward-slash character and includes only the name of the element corresponding to the start tag that it matches. The syntax is as follows:

Start tag: <elementName attributeName="attributeValue">

text included here in body of element
End tag: </elementName>
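The tag syntax can be reproduced with Python's standard `xml.etree.ElementTree` module, which builds the start tag with its attribute, the element body and the matching end tag, and parses the text back. The element and attribute names are the placeholders used in the syntax above.

```python
import xml.etree.ElementTree as ET

# Build <elementName attributeName="attributeValue">...</elementName>.
element = ET.Element("elementName", attributeName="attributeValue")
element.text = "text included here in body of element"
markup = ET.tostring(element, encoding="unicode")
print(markup)
# → <elementName attributeName="attributeValue">text included here in body of element</elementName>

# Parsing recovers the element name, the attribute value and the body text.
parsed = ET.fromstring(markup)
print(parsed.tag, parsed.get("attributeName"), parsed.text)
```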

[0022] Hypertext markup language (HTML) is the language of the world-wide web, and its tags comprise a pre-defined and non-extensible set that describes document format, i.e. how the contents of a document should be displayed. XML has tags which define an information structure by describing document content rather than document format. It allows developers to extend the set of tags used and to create their own vocabulary for describing information. A "schema" is a set of rules that describes a given class of XML documents. The schema defines the elements that can appear and their corresponding attributes. It also defines the hierarchical structure by specifying which elements are child elements of others, the order in which child elements appear and the number of child elements. XML is a simplified subset of its parent markup language, Standard Generalised Markup Language (SGML). XML is designed to allow the exchange of information between a host of different applications running on different types of computers without repeated conversion to proprietary file formats. Although XML is the preferred language, any extensible markup language with the facility for data description tags could be used as a file format for data storage in the metastore.

[0023] An example portion of an XML file that might be used in embodiments of the invention is shown in Figure 3. The <media> tag occurs at the top level of the hierarchy and contains, at the next level down, the "metadata objects" element and the "metadata tracks" element. The child elements of the metadata objects are shown as elements for person, place and topic, each of which has an "href" attribute. This attribute provides a link to an image associated with the respective metadata object. The body of each element contains the information itself; for example, there are person elements in Figure 3 that mark the names of Bill Clinton and Nelson Mandela. The metadata object elements mark text-based descriptions of objects that appear in the images, while the metadata tracks provide an index to the subset of images of a clip in which the particular metadata object associated with the metadata track features. The UMID is included as a child element of the metadata tracks. The advantage of explicitly providing an index to the subset of images in which an object appears is that, rather than downloading the entire video clip with which the object is associated, only the subset of images and the associated audio in which the metadata object appears need be downloaded from the video store. This reduces download time and saves bandwidth. The full clip can also be downloaded if so required.
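A minimal sketch of such a file, and of using a metadata track to find only the portion of a clip in which an object appears, is given below in Python. The figure itself is not reproduced here, so the element names (`metadataObjects`, `personTrack`, `umid`), the `person`, `startFrame` and `endFrame` attributes, the image URLs and the `UMID-0001` placeholder are all hypothetical; only the overall shape (objects with `href` links, tracks carrying UMIDs) follows the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical file with the general shape described above.
MEDIA_XML = """
<media>
  <metadataObjects>
    <person href="http://images.example.com/clinton.jpg">Bill Clinton</person>
    <person href="http://images.example.com/mandela.jpg">Nelson Mandela</person>
  </metadataObjects>
  <metadataTracks>
    <personTrack person="Bill Clinton" startFrame="0" endFrame="250">
      <umid>UMID-0001</umid>
    </personTrack>
    <personTrack person="Nelson Mandela" startFrame="250" endFrame="400">
      <umid>UMID-0001</umid>
    </personTrack>
  </metadataTracks>
</media>
"""


def segments_for_person(xml_text, name):
    """Return (umid, start_frame, end_frame) for each track in which the named person appears."""
    root = ET.fromstring(xml_text.strip())
    segments = []
    for track in root.find("metadataTracks").findall("personTrack"):
        if track.get("person") == name:
            start, end = int(track.get("startFrame")), int(track.get("endFrame"))
            segments.append((track.findtext("umid"), start, end))
    return segments


# Only the frames from 250 to 400 of the clip need be fetched, not the whole clip.
print(segments_for_person(MEDIA_XML, "Nelson Mandela"))  # → [('UMID-0001', 250, 400)]
```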
[0024] Figure 4 shows the hierarchical structure of the XML metadata file of Figure 3. The media tag 200 is at the top level of the hierarchy. The metadata objects 220 and the metadata tracks 210 are both child elements of the media element 200. Each metadata object has a corresponding metadata track partner. This is illustrated by the person element 230A, which corresponds to the person track 230B. The UMID elements 240 are at the lowermost level of the hierarchy in this case.
[0025] Conducting the interchange between the client and the database in XML provides advantages over the prior-art systems. In particular, the XML interface between client and database allows complex queries to be constructed using an XML query language. The software interfaces between the client and the metastore are independent of the particular data schema used by the customer, which means that the customer has the freedom to design and use his own specific business schema in conjunction with the video material database according to the invention. The video information retrieval system of the present invention also allows easy integration of proprietary video content-extraction tools and database systems from other vendors.
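The document does not name a particular XML query language. As a hedged stand-in, the limited XPath subset supported by Python's `xml.etree.ElementTree` can express the kind of structural query described, here selecting every metadata object that carries an `href` and the UMIDs of the tracks for a named person. The element names and UMID placeholders are hypothetical; a full XPath or XQuery engine would support far richer queries.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<media>"
    "<metadataObjects>"
    "<person href='http://images.example.com/mandela.jpg'>Nelson Mandela</person>"
    "<place href='http://images.example.com/capetown.jpg'>Cape Town</place>"
    "</metadataObjects>"
    "<metadataTracks>"
    "<personTrack person='Nelson Mandela'><umid>UMID-0001</umid></personTrack>"
    "<placeTrack place='Cape Town'><umid>UMID-0002</umid></placeTrack>"
    "</metadataTracks>"
    "</media>"
)

# Path query: every metadata object, whatever its tag, that carries an href attribute.
objects = [(el.tag, el.text) for el in root.findall("./metadataObjects/*[@href]")]

# Predicate query: UMIDs of the tracks in which a given person features.
umids = [u.text for u in root.findall(".//personTrack[@person='Nelson Mandela']/umid")]

print(objects)  # → [('person', 'Nelson Mandela'), ('place', 'Cape Town')]
print(umids)    # → ['UMID-0001']
```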

[0026] The XML file 155 will include URLs for low-bandwidth and full-bandwidth versions of the video clips. The user may require full-bandwidth video material for use with high-end equipment or for inclusion in a television broadcast. Low-bandwidth video material may be required by the user for viewing on low-end equipment for editing purposes, or for transmission across computer networks. The XML file will also provide links to still images, such as the representative keystamp (RKS) images for each of the video clips highlighted by the search query. The RKS images are located by a CGI script hosted by a web server, which takes the UMID and the timecode as parameters.
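Locating an RKS image by UMID and timecode amounts to building a CGI query string, sketched here with `urllib.parse.urlencode`. The script URL and the `umid` and `timecode` parameter names are hypothetical, the all-zero UMID is a placeholder hex string, and the timecode is written in the usual SMPTE HH:MM:SS:FF form.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def rks_image_url(cgi_url, umid_hex, timecode):
    """Build a GET URL asking a CGI script for the still image at `timecode` in clip `umid_hex`."""
    query = urlencode({"umid": umid_hex, "timecode": timecode})
    return f"{cgi_url}?{query}"


umid_hex = bytes(32).hex()  # placeholder: 32 zero bytes, i.e. a 64-digit hex string
url = rks_image_url("http://video.example.com/cgi-bin/rks", umid_hex, "10:00:05:12")
print(url)

# The CGI script would decode the two parameters again on the server side.
print(parse_qs(urlsplit(url).query)["timecode"])  # → ['10:00:05:12']
```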

[0027] The XML file is converted to HTML and displayed in the client's browser. The user at the client computer decides which video material to download on the basis of the metadata provided. To download the video material, the user initiates a client request 165, which is directed to the appropriate video server using the URL and UMID information contained in the XML file 155.
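The conversion of the returned XML into HTML for the browser can be sketched in Python as follows, assuming a simple result file with one hypothetical `clip` element per match carrying `umid` and `url` attributes. Each clip becomes a list item linking to its URL; text is escaped with `html.escape` so that characters such as `&` remain valid HTML. A production system might instead apply an XSLT stylesheet, which Python's standard library does not provide.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical search-result file; element and attribute names are assumptions.
RESULTS_XML = """<results>
  <clip umid="UMID-0001" url="http://video.example.com/clip1">Nelson Mandela speech</clip>
  <clip umid="UMID-0002" url="http://video.example.com/clip2">Glacier &amp; ice-shelf footage</clip>
</results>"""


def results_to_html(xml_text):
    """Render each <clip> as an HTML list item linking to its URL, re-escaping text for HTML."""
    items = []
    for clip in ET.fromstring(xml_text).findall("clip"):
        href = html.escape(clip.get("url"), quote=True)
        label = html.escape(clip.text)
        umid = html.escape(clip.get("umid"))
        items.append(f'<li><a href="{href}">{label}</a> ({umid})</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(results_to_html(RESULTS_XML))
```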

[0028] Although the metadata can be stored in the databases 130 in any format, because the exchange of data between the databases 130 and the client 100 is in XML, it may also be convenient to store metadata in hierarchical formats in the databases 130 using XML. The databases 130 could use an object database to store the XML metadata files. The hierarchical structure of XML means that it is more efficient to store complex XML files in an object database than in a relational database. The XML is parsed into object structures prior to being stored in the object database. The use of the object database has the advantage that the information is stored in a format which makes it easy to access elements and attributes rapidly, without the need to load and parse a sequential file.
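Parsing XML once into object structures and then storing those objects can be sketched as follows. A plain Python dictionary keyed by UMID stands in for the object database, and the `ClipMetadata` class and the `clip`, `umid`, `url` and `person` element names are hypothetical. After storage, fields are read as attributes without re-parsing any XML text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ClipMetadata:
    umid: str
    url: str
    people: list = field(default_factory=list)


def parse_clip(xml_text):
    """Parse one XML metadata file into a ClipMetadata object (done once, before storage)."""
    root = ET.fromstring(xml_text)
    return ClipMetadata(
        umid=root.findtext("umid"),
        url=root.findtext("url"),
        people=[person.text for person in root.iter("person")],
    )


store = {}  # stand-in for an object database: UMID -> parsed object
for document in [
    "<clip><umid>UMID-0001</umid><url>http://video.example.com/clip1</url>"
    "<person>Nelson Mandela</person><person>Bill Clinton</person></clip>",
]:
    obj = parse_clip(document)
    store[obj.umid] = obj

# Later look-ups read fields directly, without loading and parsing the XML again.
print(store["UMID-0001"].people)  # → ['Nelson Mandela', 'Bill Clinton']
```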

Claims 


1. A video information retrieval system comprising:
a client system having:

means for issuing a search request in respect of desired video material; and
means for accessing video material on the basis of a uniform resource locator (URL) and a SMPTE unique material identifier (UMID);

a server system having:

access to one or more databases containing metadata information relating to a plurality of video material items, a UMID associated with each video material item and at least one URL associated with each UMID;

means for receiving a search request from the client system and detecting one or more video material items for which metadata information stored in at least one of the database(s) substantially corresponds to the search request;

means for supplying the metadata information, the URL and the UMID relating to the one or more detected video material items to the client system;

and at least one video repository having:

a video storage arrangement storing video material and associated UMID data;

in which the metadata, the URL and the UMID are communicated between the server and the client using a markup language having descriptors for data content.

2. A system according to claim 1, in which the search requests are communicated between the server and the client using a markup language having descriptors for data content.

3. A system according to claim 1 or claim 2, in which the database stores metadata in a hierarchical representation using a markup language having descriptors for data content.

4. A system according to any one of claims 1 to 3, in which the markup language is an extensible markup language (XML).

5. A system according to any one of the preceding claims, in which the client and the server communicate via http port 80.

6. A system according to any one of the preceding claims, in which the server system is operable to supply URLs to the client system for accessing the video material in a broadcast-quality representation.

7. A system according to any one of the preceding claims, in which the server system is operable to supply URLs to the client system for accessing the video material in a sub-broadcast-quality representation.

8. A system according to any one of the preceding claims, in which the server system is operable to supply URLs and video timecodes to the client system for accessing single images representative of the content of the video material.

9. A system according to any one of the preceding claims, in which the server, the client and the video repository communicate via the world wide web.

10. A video information server having:

access to one or more databases containing metadata information relating to a plurality of video material items, a SMPTE unique material identifier (UMID) associated with each video material item and a uniform resource locator (URL) associated with each UMID;
means for receiving a search request from a client system and detecting one or more video material items for which metadata information stored in at least one of the database(s) substantially corresponds to the search request;
means for supplying the metadata information, the URL and the UMID relating to the one or more detected video material items to the client system using a markup language having descriptors for data content.

11. A video information retrieval client system comprising:

means for issuing a search request to a video information server system in respect of desired video material;

means for receiving search results from the server system comprising at least a uniform resource locator (URL) and a SMPTE unique material identifier (UMID); and
means for accessing video data from a video repository on the basis of the URL and the UMID data;

in which the metadata, the URL and the UMID are communicated between the server and the client using a markup language having descriptors for data content.

12. A method of video information retrieval using a server system having access to one or more databases containing metadata information relating to a plurality of video material items, a SMPTE unique material identifier (UMID) associated with each video material item and a URL associated with each UMID;

the method comprising the steps of:

a client system issuing a search request in respect of desired video material;
the server system receiving the search request from the client system and detecting one or more video material items for which metadata information stored in at least one of the database(s) substantially corresponds to the search request; and

the server system supplying the metadata information, the URL and the UMID relating to the one or more detected video material items to the client system using a markup language having descriptors for data content;
the client system accessing video material on the basis of the uniform resource locator (URL) from a video repository having a video storage arrangement storing video material and associated UMID data.

13. A method of video information retrieval, the method being substantially as hereinbefore described with reference to Figures 2 to 4 of the accompanying drawings.

14. Computer software having program code for carrying out a method according to claim 12 or claim 13.

15. A data providing medium by which computer software according to claim 14 is provided.

16. A medium according to claim 15, the medium being a transmission medium.

17. A medium according to claim 15, the medium being a storage medium.